An $O(n \log(n))$ Algorithm for Projecting Onto the Ordered Weighted $\ell_1$ Norm Ball
The ordered weighted $\ell_1$ (OWL) norm is a newly developed generalization
of the Octagonal Shrinkage and Clustering Algorithm for Regression (OSCAR)
norm. This norm has desirable statistical properties and can be used to perform
simultaneous clustering and regression. In this paper, we show how to compute
the projection of an $n$-dimensional vector onto the OWL norm ball in $O(n \log(n))$
operations. In addition, we illustrate the performance of our
algorithm on a synthetic regression test. Comment: 1 figure, 1 table, 14 pages; example added to appendix.
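For reference, the ordered weighted $\ell_1$ norm of a vector $x \in \mathbb{R}^n$ with weights $w_1 \ge w_2 \ge \cdots \ge w_n \ge 0$ is commonly defined as (notation assumed here, not taken from the abstract)
\[ \Omega_w(x) \;=\; \sum_{i=1}^{n} w_i\, |x|_{[i]}, \qquad |x|_{[1]} \ge |x|_{[2]} \ge \cdots \ge |x|_{[n]}, \]
where $|x|_{[i]}$ denotes the $i$-th largest entry of $x$ in absolute value; the OSCAR norm corresponds to the linearly decaying choice $w_i = \lambda_1 + \lambda_2 (n - i)$ with $\lambda_1, \lambda_2 \ge 0$.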
Convergence rate analysis of primal-dual splitting schemes
Primal-dual splitting schemes are a class of powerful algorithms that solve
complicated monotone inclusions and convex optimization problems that are built
from many simpler pieces. They decompose problems that are built from sums,
linear compositions, and infimal convolutions of simple functions so that each
simple term is processed individually via proximal mappings, gradient mappings,
and multiplications by the linear maps. This leads to easily implementable and
highly parallelizable or distributed algorithms, which often obtain nearly
state-of-the-art performance. In this paper, we analyze a monotone inclusion
problem that captures a large class of primal-dual splittings as a special
case. We introduce a unifying scheme and use some abstract analysis of the
algorithm to prove convergence rates of the proximal point algorithm,
forward-backward splitting, Peaceman-Rachford splitting, and
forward-backward-forward splitting applied to the model problem. Our ergodic
convergence rates are deduced under variable metrics, stepsizes, and
relaxation. Our nonergodic convergence rates are the first shown in the
literature. Finally, we apply our results to a large class of primal-dual
algorithms that are a special case of our scheme and deduce their convergence
rates. Comment: 31 pages, 1 table.
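As an illustrative template for the kind of structured problem such schemes decompose (a generic example, not the specific model problem analyzed in the paper), consider
\[ \min_{x}\; f(x) + g(x) + h(Lx), \]
where $f$ is differentiable with Lipschitz gradient, $g$ and $h$ are proper closed convex functions with inexpensive proximal mappings, and $L$ is a linear map. A primal-dual splitting accesses $f$ only through $\nabla f$, accesses $g$ and $h$ only through $\mathrm{prox}_{\gamma g}$ and $\mathrm{prox}_{\gamma h}$ (or $\mathrm{prox}_{\gamma h^*}$), and accesses $L$ only through products with $L$ and $L^*$.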
The Asynchronous PALM Algorithm for Nonsmooth Nonconvex Problems
We introduce the Asynchronous PALM algorithm, a new extension of the Proximal
Alternating Linearized Minimization (PALM) algorithm for solving nonsmooth,
nonconvex optimization problems. Like the PALM algorithm, each step of the
Asynchronous PALM algorithm updates a single block of coordinates; but unlike
the PALM algorithm, the Asynchronous PALM algorithm eliminates the need for
sequential updates that occur one after the other. Instead, our new algorithm
allows each of the coordinate blocks to be updated asynchronously and in any
order, which means that any number of computing cores can compute updates in
parallel without synchronizing their computations. In practice, this
asynchronous update strategy often leads to speedups that increase linearly with
the number of computing cores.
We introduce two variants of the Asynchronous PALM algorithm, one stochastic
and one deterministic. In the stochastic \textit{and} deterministic cases, we
show that cluster points of the algorithm are stationary points. In the
deterministic case, we show that the algorithm converges globally whenever the
Kurdyka-{\L}ojasiewicz property holds for a function closely related to the
objective function, and we derive its convergence rate in a common special
case. Finally, we provide a concrete case in which our assumptions hold.
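For context, the synchronous PALM iteration for a problem of the form $\min_{x, y} f(x) + g(y) + H(x, y)$, with $f, g$ possibly nonsmooth and $H$ smooth, alternates proximal-linearized block updates (standard form from the PALM literature, with stepsize parameters $c_k, d_k$ assumed here):
\[ x^{k+1} \in \operatorname{prox}_{f/c_k}\!\big(x^{k} - \tfrac{1}{c_k}\nabla_x H(x^{k}, y^{k})\big), \qquad y^{k+1} \in \operatorname{prox}_{g/d_k}\!\big(y^{k} - \tfrac{1}{d_k}\nabla_y H(x^{k+1}, y^{k})\big). \]
The asynchronous variant described above removes the requirement that these block updates be computed in strict sequence.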
Graphical Convergence of Subgradients in Nonconvex Optimization and Learning
We investigate the stochastic optimization problem of minimizing population
risk, where the loss defining the risk is assumed to be weakly convex.
Compositions of Lipschitz convex functions with smooth maps are the primary
examples of such losses. We analyze the estimation quality of such nonsmooth
and nonconvex problems by their sample average approximations. Our main results
establish dimension-dependent rates on subgradient estimation in full
generality and dimension-independent rates when the loss is a generalized
linear model. As an application of the developed techniques, we analyze the
nonsmooth landscape of a robust nonlinear regression problem. Comment: 36 pages.
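For reference, a function $f$ is called $\rho$-weakly convex if $x \mapsto f(x) + \tfrac{\rho}{2}\|x\|^2$ is convex; the compositions mentioned above fit this template by a standard argument (constants named here for illustration only):
\[ f(x) = h(c(x)), \quad h \text{ convex and } L\text{-Lipschitz}, \quad c \text{ smooth with } \beta\text{-Lipschitz Jacobian} \;\Longrightarrow\; f \text{ is } L\beta\text{-weakly convex}. \]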
Factorial and Noetherian Subrings of Power Series Rings
Let $K$ be a field. We show that certain subrings contained between the
polynomial ring and the power series ring have Weierstrass Factorization, which allows us to
deduce both unique factorization and the Noetherian property. These
intermediate subrings are obtained from elements of the power series ring by bounding
their total degree in the polynomial variables above by a positive real-valued monotonic up function
of their degree in the power series variable. These rings arise naturally in studying $p$-adic
analytic variation of zeta functions over finite fields. Future research into
this area may study more complicated subrings in which the power series part
has more than one variable, and for which there are multiple degree functions,
one for each such variable. Another direction of study would be to generalize
these results to affinoid algebras. Comment: 13 pages.
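To make the construction concrete, with hypothetical notation (write $X = (X_1, \ldots, X_n)$ for the polynomial variables, $Y$ for the power series variable, and $f$ for the monotonic up bounding function; none of these symbols are taken from the abstract), such an intermediate subring can be written as
\[ R_f \;=\; \Big\{ \textstyle\sum_{i \ge 0} a_i(X)\, Y^i \in K[X][[Y]] \;:\; \deg_X a_i \le f(i) \text{ for every } i \ge 0 \Big\}, \]
so membership bounds the total $X$-degree of the coefficient of $Y^i$ by $f(i)$.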
Stochastic model-based minimization of weakly convex functions
We consider a family of algorithms that successively sample and minimize
simple stochastic models of the objective function. We show that under
reasonable conditions on approximation quality and regularity of the models,
any such algorithm drives a natural stationarity measure to zero at the rate
$O(k^{-1/4})$. As a consequence, we obtain the first complexity guarantees for
the stochastic proximal point, proximal subgradient, and regularized
Gauss-Newton methods for minimizing compositions of convex functions with
smooth maps. The guiding principle, underlying the complexity guarantees, is
that all algorithms under consideration can be interpreted as approximate
descent methods on an implicit smoothing of the problem, given by the Moreau
envelope. Specializing to classical circumstances, we obtain the long-sought
convergence rate of the stochastic projected gradient method, without batching,
for minimizing a smooth function on a closed convex set. Comment: 33 pages, 4 figures.
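To make the recursion and the implicit smoothing concrete (generic notation assumed here: $f_{x}(\cdot, \xi)$ denotes the sampled model built at $x$, $\beta > 0$ a proximal parameter, and $\lambda > 0$ the envelope parameter):
\[ x_{k+1} \;=\; \operatorname*{argmin}_{y}\; \Big\{ f_{x_k}(y, \xi_k) + \tfrac{\beta}{2}\,\|y - x_k\|^2 \Big\}, \qquad f_{\lambda}(x) \;=\; \min_{y}\; \Big\{ f(y) + \tfrac{1}{2\lambda}\,\|y - x\|^2 \Big\}, \]
where $\xi_k$ is a fresh sample and $f_\lambda$ is the Moreau envelope; the norm of its gradient serves as the natural stationarity measure in this setting.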
Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions
Splitting schemes are a class of powerful algorithms that solve complicated
monotone inclusion and convex optimization problems that are built from many
simpler pieces. They give rise to algorithms in which the simple pieces of the
decomposition are processed individually. This leads to easily implementable
and highly parallelizable algorithms, which often obtain nearly
state-of-the-art performance.
In this paper, we provide a comprehensive convergence rate analysis of the
Douglas-Rachford splitting (DRS), Peaceman-Rachford splitting (PRS), and
alternating direction method of multipliers (ADMM) algorithms under various
regularity assumptions including strong convexity, Lipschitz differentiability,
and bounded linear regularity. The main consequence of this work is that
relaxed PRS and ADMM automatically adapt to the regularity of the problem and
achieve convergence rates that improve upon the (tight) worst-case rates that
hold in the absence of such regularity. All of the results are obtained using
simple techniques. Comment: 40 pages, 3 tables.
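As a reminder of the underlying template (standard notation assumed here, not taken from the abstract), for $\min_x f(x) + g(x)$ the relaxed PRS iteration can be written with reflected proximal maps as
\[ z^{k+1} \;=\; (1-\lambda_k)\, z^{k} + \lambda_k\, \mathrm{refl}_{\gamma g}\big(\mathrm{refl}_{\gamma f}(z^{k})\big), \qquad \mathrm{refl}_{\gamma f} := 2\,\mathrm{prox}_{\gamma f} - I, \]
with $\lambda_k \equiv 1/2$ giving DRS and $\lambda_k \equiv 1$ giving PRS; ADMM arises by applying this recursion to a dual formulation.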
A Three-Operator Splitting Scheme and its Optimization Applications
Operator splitting schemes have been successfully used in computational
sciences to reduce complex problems into a series of simpler subproblems. Since
the 1950s, these schemes have been widely used to solve problems in PDEs and
control. Recently, large-scale optimization problems in machine learning,
signal processing, and imaging have created a resurgence of interest in
operator-splitting based algorithms because they often have simple
descriptions, are easy to code, and have (nearly) state-of-the-art performance
for large-scale optimization problems. Although operator splitting techniques
were introduced over 60 years ago, their importance has significantly increased
in the past decade.
This paper introduces a new operator-splitting scheme for solving a variety
of problems that are reduced to a monotone inclusion of three operators, one of
which is cocoercive. Our scheme is very simple, and it does not reduce to any
existing splitting scheme, yet it recovers the existing forward-backward,
Douglas-Rachford, and forward-Douglas-Rachford splitting schemes as special
cases.
Our new splitting scheme leads to a set of new and simple algorithms for a
variety of other problems, including the 3-set split feasibility problems,
3-objective minimization problems, and doubly and multiple regularization
problems, as well as the simplest extension of the classic ADMM from 2 to 3
blocks of variables. In addition to the basic scheme, we introduce several
modifications and enhancements that can improve the convergence rate in
practice, including an acceleration that achieves the optimal rate of
convergence for strongly monotone inclusions. Finally, we evaluate the
algorithm on several applications. Comment: 52 pages, 5 figures.
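For reference, the three-operator splitting for a monotone inclusion $0 \in Ax + Bx + Cx$ with $C$ cocoercive is commonly stated as follows in the literature (resolvent notation $J_{\gamma A} = (I + \gamma A)^{-1}$ and relaxation parameters $\lambda_k$ assumed here):
\[ x_B^{k} = J_{\gamma B}(z^{k}), \qquad x_A^{k} = J_{\gamma A}\big(2 x_B^{k} - z^{k} - \gamma\, C x_B^{k}\big), \qquad z^{k+1} = z^{k} + \lambda_k\big(x_A^{k} - x_B^{k}\big). \]
Setting $C = 0$ recovers relaxed Douglas-Rachford splitting, and setting $B = 0$ recovers a relaxed forward-backward step, consistent with the special cases listed above.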
Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems
In this paper, we introduce a stochastic projected subgradient method for
weakly convex (i.e., uniformly prox-regular) nonsmooth, nonconvex functions---a
wide class of functions which includes the additive and convex composite
classes. At a high level, the method is an inexact proximal point iteration in
which the strongly convex proximal subproblems are quickly solved with a
specialized stochastic projected subgradient method. The primary contribution
of this paper is a simple proof that the proposed algorithm converges at the
same rate as the stochastic gradient method for smooth nonconvex problems. This
result appears to be the first convergence rate analysis of a stochastic (or
even deterministic) subgradient method for the class of weakly convex
functions. Comment: Updated 9/17/2018: major revision; added high-probability bounds,
improved the convergence analysis in general, and added new experimental results. Updated
7/26/2017: added references to the introduction and a couple of simple extensions as
Sections 3.2 and 4. Updated 8/23/2017: added NSF acknowledgements. Updated
10/16/2017: added experimental results.
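Schematically, the method is an inexact proximal point outer loop whose strongly convex subproblems are approximately solved by a stochastic projected subgradient inner loop (generic notation assumed here: $\mathcal{X}$ the constraint set, $\rho$ the weak convexity modulus, and $\hat\rho > \rho$ a proximal parameter):
\[ x_{t+1} \;\approx\; \operatorname*{argmin}_{x \in \mathcal{X}}\; \Big\{ f(x) + \tfrac{\hat\rho}{2}\,\|x - x_t\|^2 \Big\}, \]
where each subproblem is $(\hat\rho - \rho)$-strongly convex, so a fixed budget of stochastic projected subgradient steps suffices to solve it to the accuracy the analysis requires.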
Convergence rate analysis of several splitting schemes
Splitting schemes are a class of powerful algorithms that solve complicated
monotone inclusions and convex optimization problems that are built from many
simpler pieces. They give rise to algorithms in which the simple pieces of the
decomposition are processed individually. This leads to easily implementable
and highly parallelizable algorithms, which often obtain nearly
state-of-the-art performance.
In the first part of this paper, we analyze the convergence rates of several
general splitting algorithms and provide examples to prove the tightness of our
results. The most general rates are proved for the \emph{fixed-point residual}
(FPR) of the Krasnosel'ski\u{i}-Mann (KM) iteration of nonexpansive operators,
where we improve the known big-$O$ rate to little-$o$. We show the tightness of
this result and improve it in several special cases. In the second part of this
paper, we use the convergence rates derived for the KM iteration to analyze the
\emph{objective error} convergence rates for the Douglas-Rachford (DRS),
Peaceman-Rachford (PRS), and ADMM splitting algorithms under general convexity
assumptions. We show, by way of example, that the rates obtained for these
algorithms are tight in all cases and obtain the surprising statement: The DRS
algorithm is nearly as fast as the proximal point algorithm (PPA) in the
ergodic sense and nearly as slow as the subgradient method in the nonergodic
sense. Finally, we provide several applications of our result to feasibility
problems, model fitting, and distributed optimization. Our analysis is
self-contained, and most results are deduced from a basic lemma that derives
convergence rates for summable sequences, a simple diagram that decomposes each
relaxed PRS iteration, and fundamental inequalities that relate the FPR to
objective error. Comment: 45 pages; 3 figures; added convergence rate analysis of the inexact
version of the KM algorithm.
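For reference, the KM iteration of a nonexpansive operator $T$ and its fixed-point residual take the form (standard notation assumed here, not taken from the abstract)
\[ z^{k+1} \;=\; (1-\lambda_k)\, z^{k} + \lambda_k\, T z^{k}, \qquad \text{FPR}_k \;=\; \|T z^{k} - z^{k}\|, \]
so the little-$o$ improvement mentioned above asserts $\|T z^{k} - z^{k}\| = o(1/\sqrt{k+1})$ rather than only $O(1/\sqrt{k+1})$.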